Shotgun Metagenomic Data Analysis ◾ 313
-r taxonomic_level \
-o kaiju_output/ERR1823608_table.tsv \
kaiju_output/ERR1823608.out \
-l taxonomic,levels,separated,by,commas
Run “kaiju2table” to learn about the usage and options of this command.
Most taxonomic classifiers for metagenomic data follow the same two steps: downloading a
database and classifying the reads. For almost all of them, these steps require large storage
space and memory that may not be available on a regular desktop computer. If
computational resources are limited, we can use Centrifuge, which requires
relatively little storage space and memory and can run on a personal computer.
The Centrifuge classifier is available at “https://github.com/infphilo/centrifuge”; visit
that site for up-to-date installation instructions. At the time of writing, you can install it
on Linux using the following commands:
git clone https://github.com/infphilo/centrifuge
cd centrifuge
make
sudo make install prefix=/usr/local
Once it has been installed successfully, Centrifuge can be run from any directory without
further setup. Run “centrifuge -h” to display the usage and options.
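A quick way to confirm the installation is to check that the executables are reachable through PATH. The following is a minimal sketch, not part of the Centrifuge distribution; the helper name tool_on_path is hypothetical.

```shell
# Hypothetical helper: report whether a tool is reachable through PATH.
tool_on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH"
        return 1
    fi
}

# Check the classifier itself plus the build and download utilities.
for tool in centrifuge centrifuge-build centrifuge-download; do
    tool_on_path "$tool" || true
done
```

If any tool is reported as not found, the install prefix (here /usr/local) may not have its bin directory on PATH.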
As usual, to use the Centrifuge classifier, we begin by building an index. Several
ready-to-use indexes are available at http://www.ccb.jhu.edu/software/centrifuge.
Building an index yourself requires sequence files, taxonomy files, and a map of sequence
IDs to taxonomy IDs. This can be simplified with the “make” command, which can build
several standard and custom indices. To do that, change into the “indices” directory inside
the Centrifuge directory and run the “make” command with one of the following targets:
cd indices
make p+h+v             # bacterial, human, and viral genomes [~12G]
make p_compressed      # bacterial genomes compressed at the species level [~4.2G]
make p_compressed+h+v  # combination of the two above [~8G]
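Because these targets produce indexes of several gigabytes, it can help to check the free disk space before starting. The sketch below is illustrative only: the helper name has_free_gb is hypothetical, and the 12 GB threshold is simply the p+h+v index size quoted above. The build may need additional room for the downloaded genomes, which this check does not account for.

```shell
# Hypothetical helper: succeed if the filesystem holding directory $1
# has at least $2 gigabytes available.
has_free_gb() {
    dir=$1
    need_gb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    need_kb=$((need_gb * 1024 * 1024))
    [ "$avail_kb" -ge "$need_kb" ]
}

# Compare the current directory against the quoted p+h+v index size.
if has_free_gb . 12; then
    echo "Enough free space for p+h+v"
else
    echo "Less than 12 GB free; consider p_compressed or a prebuilt index"
fi
```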
Each of these commands downloads the reference taxonomy files and the reference genomes
at the appropriate assembly level and then builds the index. The download may take a while,
depending on the speed of the Internet connection. It is often easier to download a prebuilt
index from the Centrifuge homepage; see
“https://ccb.jhu.edu/software/centrifuge/manual.shtml”.
Centrifuge is then used to assign taxa to the short reads in the FASTQ files. For the “-x”
option, provide the index name, including its path if the index is not in the current
directory.
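Before running the classifier, it may be worth confirming that the prefix intended for “-x” actually points at an index. The sketch below assumes the index is stored as three files named <prefix>.1.cf, <prefix>.2.cf, and <prefix>.3.cf. The helper name index_present and the directory /path/to/indices are placeholders, not part of Centrifuge.

```shell
# Hypothetical helper: succeed only if all expected index files exist.
# Assumes the index is stored as <prefix>.1.cf, <prefix>.2.cf, <prefix>.3.cf.
index_present() {
    prefix=$1
    missing=0
    for n in 1 2 3; do
        if [ ! -f "$prefix.$n.cf" ]; then
            echo "missing: $prefix.$n.cf"
            missing=1
        fi
    done
    return "$missing"
}

# /path/to/indices is a placeholder; replace it with the real location.
if index_present /path/to/indices/p+h+v; then
    echo "index found; pass this prefix to -x"
else
    echo "index incomplete; check the path given to -x"
fi
```

The value passed to “-x” is the prefix without the numbered “.cf” suffix.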
mkdir centrifuge_out
centrifuge -x p+h+v \